55 research outputs found
Distributional Learning of Variational AutoEncoder: Application to Synthetic Data Generation
The Gaussianity assumption has been consistently criticized as a main
limitation of the Variational Autoencoder (VAE) despite its efficiency in
computational modeling. In this paper, we propose a new approach that expands
the model capacity (i.e., expressive power of distributional family) without
sacrificing the computational advantages of the VAE framework. Our VAE model's
decoder is composed of an infinite mixture of asymmetric Laplace distributions,
which possesses general distribution fitting capabilities for continuous
variables. Our model is represented by a special form of a nonparametric
M-estimator for estimating general quantile functions, and we theoretically
establish the connection between the proposed model and quantile estimation. We
apply the proposed model to synthetic data generation; in particular, it
demonstrates superiority in easily adjusting the level of data privacy.
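The quantile connection above rests on a standard equivalence: maximizing an asymmetric Laplace (AL) likelihood in its location parameter is the same as minimizing the check (pinball) loss of quantile regression, which is why an AL decoder implicitly estimates conditional quantiles. A minimal sketch (function names and the toy data are illustrative, not from the paper):

```python
import math

def check_loss(tau, u):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def asym_laplace_logpdf(y, mu, sigma, tau):
    """Log-density of the asymmetric Laplace distribution.
    Its negative log-likelihood equals the check loss up to scale and
    constants, linking AL likelihood maximization to quantile estimation."""
    u = (y - mu) / sigma
    return math.log(tau * (1.0 - tau) / sigma) - check_loss(tau, u)

# Minimizing the summed check loss over mu recovers the empirical
# tau-quantile; for tau = 0.5 this is the median (here: 3.0).
data = [1.0, 2.0, 3.0, 4.0, 5.0]

def total_loss(mu, tau):
    return sum(check_loss(tau, y - mu) for y in data)

# Crude grid search over mu in [0, 6] just to illustrate the minimizer.
best = min((total_loss(mu / 100.0, 0.5), mu / 100.0) for mu in range(601))[1]
```

For tau = 0.5 the grid search lands on the sample median, and shifting tau moves the minimizer to the corresponding empirical quantile.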
Joint Distributional Learning via Cramer-Wold Distance
The assumption of conditional independence among observed variables,
primarily used in the Variational Autoencoder (VAE) decoder modeling, has
limitations when dealing with high-dimensional datasets or complex correlation
structures among observed variables. To address this issue, we introduced the
Cramer-Wold distance regularization, which can be computed in closed form, to
facilitate joint distributional learning for high-dimensional datasets.
Additionally, we introduced a two-step learning method to enable flexible prior
modeling and improve the alignment between the aggregated posterior and the
prior distribution. Furthermore, we provide theoretical distinctions from
existing methods within this category. To evaluate the synthetic data
generation performance of our proposed approach, we conducted experiments on
high-dimensional datasets with multiple categorical variables. Given that such
datasets are common in real-world data sources and data science applications,
our experiments demonstrate the effectiveness of the proposed methodology.
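The Cramer-Wold distance compares two multivariate samples through all of their one-dimensional projections. The paper uses a closed form that avoids sampling projections; the sketch below is only a hedged Monte Carlo stand-in that averages, over random unit directions, the squared L2 distance between Gaussian-smoothed 1D empirical densities (all names and parameter values are our own illustrative choices):

```python
import math
import random

def _smoothing_kernel(d, gamma):
    # N(d; 0, 2*gamma): cross term of two Gaussian-smoothed point masses.
    return math.exp(-d * d / (4.0 * gamma)) / math.sqrt(4.0 * math.pi * gamma)

def _projected_l2(xs, ys, gamma):
    # Squared L2 distance between smoothed empirical densities of 1D samples.
    n, m = len(xs), len(ys)
    t_xx = sum(_smoothing_kernel(a - b, gamma) for a in xs for b in xs) / n**2
    t_yy = sum(_smoothing_kernel(a - b, gamma) for a in ys for b in ys) / m**2
    t_xy = sum(_smoothing_kernel(a - b, gamma) for a in xs for b in ys) / (n * m)
    return t_xx + t_yy - 2.0 * t_xy

def cramer_wold_mc(X, Y, gamma=1.0, n_proj=64, rng=random):
    """Monte Carlo Cramer-Wold-type distance: average projected L2
    distance over random unit directions (illustrative approximation)."""
    dim = len(X[0])
    total = 0.0
    for _ in range(n_proj):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        xs = [sum(c * xi for c, xi in zip(v, x)) for x in X]
        ys = [sum(c * yi for c, yi in zip(v, y)) for y in Y]
        total += _projected_l2(xs, ys, gamma)
    return total / n_proj

# The distance vanishes for identical samples and is positive for shifted ones.
X = [[0.0, 0.0], [1.0, 1.0], [0.5, -0.5]]
Y = [[x1 + 3.0, x2 + 3.0] for x1, x2 in X]
d_same = cramer_wold_mc(X, X)
d_diff = cramer_wold_mc(X, Y)
```

The closed-form version in the paper replaces the projection loop with an analytic expectation, which is what makes the regularizer cheap enough for training.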
Causally Disentangled Generative Variational AutoEncoder
We present a new supervised learning technique for the Variational
AutoEncoder (VAE) that allows it to learn a causally disentangled
representation and generate causally disentangled outcomes simultaneously. We
call this approach Causally Disentangled Generation (CDG). CDG is a generative
model that accurately decodes an output based on a causally disentangled
representation. Our research demonstrates that adding supervised regularization
to the encoder alone is insufficient for achieving a generative model with CDG,
even for a simple task. Therefore, we explore the necessary and sufficient
conditions for achieving CDG within a specific model. Additionally, we
introduce a universal metric for evaluating the causal disentanglement of a
generative model. Empirical results from both image and tabular datasets
support our findings.
Interpretable Water Level Forecaster with Spatiotemporal Causal Attention Mechanisms
Forecasting the water level of the Han river is important for controlling
traffic and avoiding natural disasters. Many variables are related to the Han
river, and they are intricately connected. In this work, we propose a novel
transformer that exploits the causal relationship based on the prior knowledge
among the variables and forecasts the water level at the Jamsu bridge in the
Han river. Our proposed model considers both spatial and temporal causation by
formalizing the causal structure as a multilayer network and using masking
methods. This approach yields interpretability that is consistent with prior
knowledge. In the real data analysis, we use the Han river dataset from 2016 to
2021 and compare the proposed model with deep learning models.
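The spatial masking idea can be sketched as follows: a prior-knowledge causal graph among the variables becomes an attention mask, so each query variable attends only to its causal parents (and itself). This is a minimal illustration under our own simplifying assumptions, not the paper's full spatiotemporal architecture; the variable names are hypothetical:

```python
import math

def masked_attention_weights(scores, causal_adj):
    """Softmax over attention scores, restricted by a causal mask.
    causal_adj[i][j] = 1 if variable j is a causal parent of variable i;
    self-attention is always allowed. Disallowed entries get weight 0."""
    weights = []
    for i, row in enumerate(scores):
        masked = [s if (causal_adj[i][j] or i == j) else float("-inf")
                  for j, s in enumerate(row)]
        mx = max(masked)
        exps = [math.exp(s - mx) for s in masked]  # exp(-inf) == 0.0
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights

# Toy chain: rainfall (0) -> upstream level (1) -> bridge level (2).
adj = [[0, 0, 0],
       [1, 0, 0],
       [1, 1, 0]]
scores = [[0.2, 0.5, 0.1],
          [0.3, 0.4, 0.9],
          [0.7, 0.2, 0.6]]
W = masked_attention_weights(scores, adj)
```

Because the mask zeroes out every non-causal entry, the learned attention pattern cannot contradict the prior causal structure, which is the source of the interpretability claim.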
Restoring the Credibility of the Legal and Economic Foundations of Financial Stability: The Need to Incorporate Economic Theories?
To what extent can monetary and financial crises and cycles be explained through economic theories? This paper aims to highlight why a reliance on economic theories may be necessary, given certain flaws revealed by the recent Financial Crisis, namely that the economic and legal foundations of financial stability cannot always be considered credible.
Further, the paper emphasizes that, despite the valid argument that a reference to economic theories may be required to explain the causes of financial and monetary crises, those causes can also be explained from other perspectives, even though these perspectives may sometimes be less accurate.
Clustering for Regional Time Trend in the Nonstationary Extreme Distribution
Since the estimation of tail properties requires stationarity of observations, it is necessary to develop a de-trending method for nonstationary hydrological processes that does not depend on the underlying distribution. Moreover, de-trending has typically been applied to each hydrological process independently, even though the processes are observed at geographically adjacent sites. This paper presents a distribution-free de-trending method for nonstationary hydrological processes. Our method also provides clustered regional trends obtained by sparse regularization under a general distribution, aggregating parameter estimation and clustering within a unified framework. In a simulation study, the proposed method outperforms competing methods with respect to the MSE and variance of the coefficient estimates. In the real data analysis, we report the clustered trends of the annual maximum precipitation over the South Korean peninsula and visualize the patterns of the estimated trends.
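The idea of estimating site-wise time trends and then grouping nearby trends can be illustrated with a deliberately simplified two-stage sketch: fit a linear trend per site by least squares, then merge sites whose slopes are close. The paper instead couples estimation and clustering through sparse regularization in one framework, so this is only a hedged stand-in with invented toy data:

```python
def linear_trend(series):
    """Least-squares slope of a series against the time index 0..n-1."""
    n = len(series)
    t_bar = (n - 1) / 2.0
    y_bar = sum(series) / n
    num = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(series))
    den = sum((t - t_bar) ** 2 for t in range(n))
    return num / den

def cluster_trends(slopes, tol):
    """Greedy 1D clustering: sort slopes, start a new cluster at gaps > tol."""
    order = sorted(range(len(slopes)), key=lambda i: slopes[i])
    clusters = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if slopes[cur] - slopes[prev] > tol:
            clusters.append([cur])
        else:
            clusters[-1].append(cur)
    return clusters

# Three toy sites: two share a trend of ~2 per step, one trends at ~5.
sites = [
    [2.0 * t for t in range(10)],
    [2.0 * t + 1.0 for t in range(10)],
    [5.0 * t for t in range(10)],
]
slopes = [linear_trend(s) for s in sites]
groups = cluster_trends(slopes, tol=0.5)
```

A fused-penalty formulation would shrink nearby slopes to exactly equal values rather than thresholding them after the fact, which is what makes the clustering and estimation a single problem.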
Learning a High-dimensional Linear Structural Equation Model via l1-Regularized Regression
This paper develops a new approach to learning high-dimensional linear structural equation models (SEMs) without the commonly assumed faithfulness, Gaussian error distribution, and equal error distribution conditions. A key component of the algorithm is componentwise ordering and parent estimation, where both problems can be efficiently addressed using l1-regularized regression. This paper proves that sample sizes n = Ω(d^2 log p) and n = Ω(d^2 p^(2/m)) are sufficient for the proposed algorithm to recover linear SEMs with sub-Gaussian and (4m)-th bounded-moment error distributions, respectively, where p is the number of nodes and d is the maximum degree of the moralized graph. We further show a worst-case computational complexity of O(n(p^3 + p^2 d^2)); hence, the proposed algorithm is statistically consistent and computationally feasible for learning a high-dimensional linear SEM when its moralized graph is sparse. Through simulations, we verify that the proposed algorithm is statistically consistent and computationally feasible, and that it performs well compared to the state-of-the-art US, GDS, LISTEN, and TD algorithms in our settings. We also demonstrate through real COVID-19 data that the proposed algorithm is well suited to estimating a virus-spread map in China.
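The parent-estimation step relies on l1-regularized (lasso) regression of each node on candidate predecessors. Below is a self-contained coordinate-descent lasso; this is a generic textbook implementation on invented toy data, not the paper's algorithm:

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator, the proximal map of the l1 penalty."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.
    X is a list of rows; returns the coefficient list b."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    cols = [[row[j] for row in X] for j in range(p)]
    denom = [sum(v * v for v in col) / n for col in cols]
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual excluding feature j's current contribution.
            r = [y[i] - sum(b[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            z = sum(cols[j][i] * r[i] for i in range(n)) / n
            b[j] = soft_threshold(z, lam) / denom[j] if denom[j] > 0 else 0.0
    return b

# Toy check: y depends only on the first feature, so at a moderate
# penalty the lasso should zero out the second coefficient entirely.
X = [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
y = [2.0, 4.0, 6.0, 8.0]
b = lasso_cd(X, y, lam=0.01)
```

In the SEM setting, the support of such a lasso fit over already-ordered variables serves as the estimated parent set of each node, which is what keeps the procedure feasible when the moralized graph is sparse.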
Statistical Road-Traffic Noise Mapping Based on Elementary Urban Forms in Two Cities of South Korea
Statistical models that can generate a road-traffic noise map for a city or area where only elementary urban design factors are determined, and where no concrete urban morphology, including buildings and roads, is given, can provide basic but essential information for developing a quiet and sustainable city. Long-term, cost-effective measures for a quiet urban area can be considered at early city-planning stages by using the statistical road-traffic noise map. An artificial neural network (ANN) and an ordinary least squares (OLS) model were developed by utilizing data on urban form indicators, based on a 3D urban model and road-traffic noise levels from a normal noise map of city A (Gwangju). The developed ANN and OLS models were applied to city B (Cheongju), and the resulting statistical noise map of city B was compared with an existing normal road-traffic noise map of city B. Urban form indicators that showed multicollinearity were excluded by the OLS model, and among the remaining indicators, road-related ones such as traffic volume and road area density were found to be important variables for predicting the road-traffic noise level and designing a quiet city. Comparisons of the statistical ANN and OLS noise maps with the normal noise map showed that the OLS model tends to underestimate road-traffic noise levels, while the ANN model tends to overestimate them.
A Projection of Extreme Precipitation Based on a Selection of CMIP5 GCMs over North Korea
The numerous choices among climate change scenarios make decision-making difficult for the assessment of climate change impacts. Previous studies have compared climate models in terms of how well they simulate observed climates or preserve the variability among scenarios. In this study, the Katsavounidis-Kuo-Zhang (KKZ) algorithm was applied to select representative climate change scenarios (RCCS) that preserve the variability among all climate change scenarios (CCS). The performance of the multi-model ensemble of the RCCS was evaluated for reference and future climates, and the RCCS were found to represent both the observations and the multi-model ensemble of all CCS well. Using the RCCS under RCP (Representative Concentration Pathway) 8.5, future extreme precipitation was projected. The magnitude and frequency of extreme precipitation increase toward the farther future; in particular, extreme precipitation with a 20-year return period (daily maximum precipitation) during 2070-2099 was projected to occur once every 8.3 years. Since the RCCS employed in this study successfully represent the performance of all CCS, this approach offers opportunities to manage water resources efficiently when assessing climate change impacts.
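The Katsavounidis-Kuo-Zhang algorithm mentioned above originates in vector-quantization codebook initialization: it greedily picks points that are maximally spread out, each new pick maximizing its minimum distance to the points already chosen. A minimal sketch, where representing each scenario as a small feature vector and using Euclidean distance are our own illustrative assumptions:

```python
import math

def kkz_select(points, k):
    """KKZ selection: start from the point of largest norm, then repeatedly
    add the point whose minimum distance to the chosen set is largest.
    Used here to pick k representative scenarios out of many."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    origin = [0.0] * len(points[0])
    chosen = [max(range(len(points)), key=lambda i: dist(points[i], origin))]
    while len(chosen) < k:
        nxt = max((i for i in range(len(points)) if i not in chosen),
                  key=lambda i: min(dist(points[i], points[c]) for c in chosen))
        chosen.append(nxt)
    return chosen

# Five toy scenario vectors; two near-duplicate pairs plus one outlier.
scenarios = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 5.0]]
picked = kkz_select(scenarios, 3)
```

Because near-duplicate scenarios are never both selected, the chosen subset preserves the spread (variability) of the full scenario set, which is the property the study relies on.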
- …